
WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Neural Information Processing Systems

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this context. Our work considers this question, examines the accuracy of existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian. Our main application is to neural network compression, where we build on the classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. Further, even when iterative, gradual pruning is allowed, our method yields a gain in test accuracy over state-of-the-art approaches on popular image classification datasets such as ImageNet ILSVRC. Finally, we show how our method can be extended to take into account first-order information, and illustrate its ability to automatically set layer-wise pruning thresholds and perform compression in the limited-data regime.
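As background for the Optimal Brain Damage/Surgeon framework the abstract refers to, the classic OBS step removes the weight with the smallest saliency and applies a compensating update to the remaining weights. The following is a minimal dense sketch of that step (the function name and toy setup are illustrative, not from the paper; WoodFisher's contribution is supplying the inverse-Hessian estimate this step consumes):

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """One step of the classic Optimal Brain Surgeon procedure:
    remove the weight q with the smallest saliency
        rho_q = w_q^2 / (2 * [H^{-1}]_qq)
    and shift the remaining weights by the optimal compensating update
        delta_w = -(w_q / [H^{-1}]_qq) * H^{-1} e_q.
    """
    rho = w**2 / (2 * np.diag(H_inv))
    q = int(np.argmin(rho))
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]  # compensating update
    w_new = w + delta
    w_new[q] = 0.0  # the update already zeroes w_q; set exactly to 0
    return w_new, q
```

Note that `delta[q] = -w[q]` by construction, so the selected weight is driven exactly to zero while the loss increase is minimized to second order.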


Review for NeurIPS paper: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Neural Information Processing Systems

Weaknesses: --- Missing details about lambda. While mentioned on line 138, the dampening parameter lambda does not appear in the experimental section of the main body; I only found the value 1e-5 in the appendix (l. 799). How do you select its value? I expect your final algorithm to be very sensitive to lambda, since \delta_L as defined in Eq. 4 selects directions with the smallest curvature. Another comment about lambda: if you set it to a very large value k, it becomes dominant compared to the eigenvalues of F, and your technique then essentially amounts to magnitude pruning. In that regard, magnitude pruning (MP) is just a special case of your technique, obtained with a large dampening value.
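The reviewer's point about large lambda can be illustrated numerically. Under the OBS saliency rho_q = w_q^2 / (2 [(F + lambda I)^{-1}]_qq), a dominant lambda gives (F + lambda I)^{-1} ≈ (1/lambda) I, so rho_q ≈ lambda w_q^2 / 2 and the pruning order collapses to the magnitude order. A small sketch with a toy Fisher matrix (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((6, 6))
F = A @ A.T                      # toy PSD "Fisher" matrix
w = rng.standard_normal(6)       # toy weight vector

def obs_ranking(F, w, lam):
    """Removal order under the OBS saliency rho_q = w_q^2 / (2 [(F + lam I)^{-1}]_qq)."""
    H_inv = np.linalg.inv(F + lam * np.eye(len(w)))
    rho = w**2 / (2 * np.diag(H_inv))
    return np.argsort(rho)

# With lam huge, (F + lam I)^{-1} is essentially (1/lam) I, so rho_q is
# proportional to w_q^2: the OBS order matches the magnitude-pruning order.
print(np.array_equal(obs_ranking(F, w, 1e8), np.argsort(w**2)))
```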


Review for NeurIPS paper: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Neural Information Processing Systems

The focus of the submission is training neural networks using second-order information. In particular, the goal of the work is to approximate the inverse of the empirical Fisher matrix as defined in the displayed equation under (1). The authors observe that the empirical Fisher is an average of dyads (a a^T, where ^T denotes transposition), hence its inverse can be computed recursively via the Woodbury matrix identity. The resulting inverse is applied to prune convolutional neural networks (CNNs) and is compared against other unstructured pruning methods. Training and pruning neural networks are central problems of machine learning.
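The recursion the review describes can be sketched directly: starting from the inverse of the dampening term and folding in one rank-one gradient term at a time via the Sherman-Morrison (rank-one Woodbury) identity. A minimal dense version, assuming the dampened empirical Fisher F = lambda*I + (1/N) * sum_i g_i g_i^T (the function name is illustrative):

```python
import numpy as np

def woodfisher_inverse(grads, lam=1e-5):
    """Inverse of F = lam*I + (1/N) * sum_i g_i g_i^T, built recursively.

    Each rank-one update uses Sherman-Morrison:
      (A + (1/N) g g^T)^{-1} = A^{-1} - (A^{-1} g)(A^{-1} g)^T / (N + g^T A^{-1} g)
    so no d x d matrix is ever explicitly inverted.
    """
    N, d = grads.shape
    F_inv = np.eye(d) / lam          # inverse of the dampening term lam*I
    for g in grads:
        Fg = F_inv @ g
        F_inv -= np.outer(Fg, Fg) / (N + g @ Fg)
    return F_inv
```

Each update costs O(d^2), so the full inverse is assembled in O(N d^2) rather than the O(d^3) of a direct inversion, which is the efficiency argument behind the recursive construction.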

